Linking LTAG with LTIG

Author

  • Ali Basirat
Abstract

Lexicalized Tree-Adjoining Grammar (LTAG) is a tree-generating system that defines a language through a set of derived trees. LTAG is a mildly context-sensitive grammar (MCSG) formalism. The elementary units of rewriting in this formalism are basic syntactic structures represented as trees, called elementary trees. Elementary trees are combined by the substitution and adjunction operations to form derived trees, and the history of these combinations is recorded in derivation trees. Each elementary tree is associated with a word of the language, called the anchor, and defines a syntactic environment in which the anchor can appear. LTAGs are well known for their ability to represent the structural descriptions of syntactic phenomena (Kroch and Joshi, 1985; Frank, 2004). The power of this formalism follows directly from the way it factors recursion out of the domain of dependencies. Moreover, its extended domain of locality allows the grammar to impose syntactic and semantic constraints on the arguments of a predicate within a single elementary structure. The complexity of parsing is an obstacle to real-world applications of LTAGs (Satta, 1994): in the worst case, TAG parsing takes O(n⁶) time and O(n⁴) space for a sentence of length n. Researchers have tried to improve parsing performance in TAG by restricting the formalism. Tree-Insertion Grammar (TIG) is a restricted variant of TAG that combines the efficiency of context-free grammars (CFGs) with the lexicalizing power of the TAG formalism (Schabes and Waters, 1995). The main difference between TIG and TAG is TIG's restriction on adjunction, which prevents it from generating context-sensitive languages. A TIG generates a context-free language, whose sentences can be parsed in O(n³) time. Here, I propose to build a statistical model that combines the advantages of LTAGs with those of LTIGs.
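To make substitution, adjunction, and the TIG restriction concrete, here is a minimal Python sketch. The `Node` class, the path-based addressing of nodes, and the toy trees for *John*, *sleeps*, *soundly*, and *really* are hypothetical illustrations, not trees from XTAG or MICA. The `aux_kind` check follows the TIG distinction between left auxiliary trees (foot is the rightmost frontier node), right auxiliary trees (foot is the leftmost), and wrapping trees, which TIG forbids. It checks only that ban and ignores TIG's other constraints on adjunction.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    label: str
    children: list[Node] = field(default_factory=list)
    anchor: Optional[str] = None  # lexical anchor (word) of a preterminal leaf
    subst: bool = False           # open substitution site, written X↓
    foot: bool = False            # foot node of an auxiliary tree, written X*


def get(tree: Node, path: tuple[int, ...]) -> Node:
    """Follow a path of child indices from the root."""
    for i in path:
        tree = tree.children[i]
    return tree


def replace_at(tree: Node, path: tuple[int, ...], new: Node) -> Node:
    """Put `new` at `path` (in place) and return the possibly new root."""
    if not path:
        return new
    get(tree, path[:-1]).children[path[-1]] = new
    return tree


def frontier(node: Node) -> list[Node]:
    """Leaf nodes from left to right."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in frontier(child)]


def foot_path(node: Node, path: tuple[int, ...] = ()) -> Optional[tuple[int, ...]]:
    """Path to the foot node of an auxiliary tree, or None."""
    if node.foot:
        return path
    for i, child in enumerate(node.children):
        found = foot_path(child, path + (i,))
        if found is not None:
            return found
    return None


def substitute(tree: Node, path: tuple[int, ...], initial: Node) -> Node:
    """Replace the substitution site at `path` with a copy of an initial tree."""
    tree = copy.deepcopy(tree)
    site = get(tree, path)
    if not site.subst or site.label != initial.label:
        raise ValueError(f"cannot substitute {initial.label} at {site.label}")
    return replace_at(tree, path, copy.deepcopy(initial))


def adjoin(tree: Node, path: tuple[int, ...], aux: Node) -> Node:
    """Splice a copy of `aux` in at the internal node at `path`;
    the excised subtree is re-attached at the foot node."""
    tree, aux = copy.deepcopy(tree), copy.deepcopy(aux)
    target = get(tree, path)
    fpath = foot_path(aux)
    if fpath is None or aux.label != target.label:
        raise ValueError(f"cannot adjoin {aux.label} at {target.label}")
    aux = replace_at(aux, fpath, target)
    return replace_at(tree, path, aux)


def tree_yield(node: Node) -> list[str]:
    """Anchored words left to right; open sites shown as X↓ or X*."""
    return [leaf.anchor if leaf.anchor is not None
            else leaf.label + ("↓" if leaf.subst else "*" if leaf.foot else "")
            for leaf in frontier(node)]


def aux_kind(aux: Node) -> str:
    """'left' if the foot is rightmost, 'right' if leftmost, else 'wrapping'
    (not allowed in TIG). Empty leaves are not modeled here."""
    leaves = frontier(aux)
    i = next(k for k, leaf in enumerate(leaves) if leaf.foot)
    if i == len(leaves) - 1:
        return "left"
    return "right" if i == 0 else "wrapping"


def lex(label: str, word: str) -> Node:
    return Node(label, anchor=word)


# Hypothetical toy elementary trees.
john = Node("NP", [lex("N", "John")])
sleeps = Node("S", [Node("NP", subst=True), Node("VP", [lex("V", "sleeps")])])
soundly = Node("VP", [Node("VP", foot=True), lex("Adv", "soundly")])
really = Node("VP", [lex("Adv", "really"), Node("VP", foot=True)])
wrap = Node("S", [lex("A", "a"), Node("S", foot=True), lex("B", "b")])

derived = adjoin(substitute(sleeps, (0,), john), (1,), soundly)
print(tree_yield(derived))                              # ['John', 'sleeps', 'soundly']
print([aux_kind(t) for t in (soundly, really, wrap)])   # ['right', 'left', 'wrapping']
```

Copying before each operation keeps the elementary trees reusable, mirroring how the same elementary tree can appear many times in a derivation.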
The research will focus on two specific grammars for English: the English XTAG grammar (XTAG-Group, 2001) and the automatically extracted LTIG used by the MICA parser (Bangalore et al., 2009). The former is a well-known hand-crafted LTAG developed by the XTAG Research Group at the University of Pennsylvania. The elementary trees of the XTAG grammar are annotated with feature structures, which make the grammar independent of any particular textual domain. The latter is a data-driven LTIG that was automatically extracted from the Penn Treebank using Chen's grammar extraction method (Chen, 2001). It contains a large number of elementary trees, together with statistical information about their occurrence in syntactic environments. Unlike the XTAG grammar, the MICA grammar depends on the domain on which it was trained; nonetheless, it provides good coverage of the syntactic phenomena in its training corpus, the Penn Treebank. I will use synchronous context-free grammar (SCFG) (Aho and Ullman, 1969) to model the interrelationships between the two grammars, for two main reasons. First, the problem of linking TAGs can be seen as a translation task in which a sequence of elementary trees from a source TAG is translated into a sequence of elementary trees from a target TAG, where both sequences are assigned to the same sentence. Second, SCFGs are suitable for modeling long-distance dependencies between discontinuous sequences of MICA elementary trees assigned to discontinuous multiword expressions. Multiword expressions (MWEs) are lexical items made up of multiple simplex words (e.g., nominal compounds, idioms, and verb-particle constructions) (Kim and Baldwin, 2010).
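The translation view can be illustrated with a minimal synchronous CFG in Python. Each rule rewrites a nonterminal into a pair of right-hand sides, and nonterminals that share a link index are expanded by the same sub-derivation, which keeps the two sides aligned. The supertag names (`m_np`, `m_verb`, `m_particle` standing for MICA trees; `x_np`, `x_verb_particle` standing for XTAG trees) and the rules themselves are hypothetical placeholders, not entries from either grammar. The sketch only realizes a given derivation; it does not learn or score rules, which the proposed statistical model would have to do.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, Tuple, Union

# A symbol is a terminal (str) or a linked nonterminal (label, link index).
Symbol = Union[str, Tuple[str, int]]


@dataclass(frozen=True)
class Rule:
    """Synchronous rule lhs -> <src, tgt>. Nonterminals sharing a link
    index on the two sides are rewritten by the same sub-derivation."""
    lhs: str
    src: Tuple[Symbol, ...]
    tgt: Tuple[Symbol, ...]

    def __post_init__(self) -> None:
        def links(side):
            return sorted(s for s in side if isinstance(s, tuple))
        if links(self.src) != links(self.tgt):
            raise ValueError(f"nonterminals of rule {self.lhs} are not linked one-to-one")


@dataclass
class Derivation:
    rule: Rule
    children: Dict[int, Derivation] = field(default_factory=dict)  # link index -> sub-derivation


def realize(d: Derivation, side: str) -> list[str]:
    """Terminal yield of a synchronous derivation on side 'src' or 'tgt'."""
    out: list[str] = []
    for sym in getattr(d.rule, side):
        if isinstance(sym, str):
            out.append(sym)
            continue
        label, link = sym
        child = d.children[link]
        if child.rule.lhs != label:
            raise ValueError(f"link {link} expects {label}, got {child.rule.lhs}")
        out.extend(realize(child, side))
    return out


# Hypothetical rules: m_* stand for MICA supertags, x_* for XTAG trees.
NP = Rule("NP", ("m_np",), ("x_np",))
# Discontinuous MWE: verb ... particle on the MICA side,
# one multi-anchor tree on the XTAG side.
VPRT = Rule("VP", ("m_verb", ("NP", 1), "m_particle"), ("x_verb_particle", ("NP", 1)))
S = Rule("S", (("NP", 1), ("VP", 2)), (("NP", 1), ("VP", 2)))

# A sentence such as "John picked it up".
d = Derivation(S, {1: Derivation(NP), 2: Derivation(VPRT, {1: Derivation(NP)})})
print(realize(d, "src"))  # ['m_np', 'm_verb', 'm_np', 'm_particle']
print(realize(d, "tgt"))  # ['x_np', 'x_verb_particle', 'x_np']
```

In the `VPRT` rule, two separated MICA symbols correspond to a single XTAG symbol, and the linked `NP` between them is generated once for both sides. This is the kind of discontinuous, many-to-one alignment that motivates choosing an SCFG here.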
The synchronous context-free grammar is expected to associate appropriate multi-anchor XTAG elementary trees (i.e., elementary trees anchored in more than one word) with both continuous and discontinuous sequences of MICA elementary trees assigned to multiword expressions. The multi-anchor elementary trees in the English XTAG grammar model the syntactic environments of MWEs in English.

Similar resources

Argument linking in LTAG: A constraint-based implementation with XMG

This paper develops a first systematic approach to argument linking in LTAG, building on typologically oriented work in Van Valin (2005). While Van Valin's argument linking mechanism is procedurally defined, we propose a constraint-based implementation. The advantage is that we can separate between the linguistic generalizations to be captured and algorithmic considerations. The implementation...

A Frame-Based Semantics of Locative Alternation in LTAG

In this paper we present an analysis of locative alternation phenomena in Russian and English within a frame-based LTAG syntax-semantics interface. The combination of a syntactic theory with an extended domain of locality and frames provides a powerful mechanism for argument linking. Furthermore, the concept of tree families and unanchored trees in LTAG allows for a decomposition of meaning int...

Constructing Linguistically Motivated Structures from Statistical Grammars

This paper discusses two Hidden Markov Models (HMM) for linking the linguistically motivated XTAG grammar and the automatically extracted LTAG used by the MICA parser. The former grammar is a detailed LTAG enriched with feature structures. The latter is a very large LTAG that, due to its statistical nature, is well suited to statistical approaches. Lack of an efficient parser and sparse...

LTAG-spinal and the Treebank: a new resource for incremental, dependency and semantic parsing

We introduce LTAG-spinal, a novel variant of traditional Lexicalized Tree Adjoining Grammar (LTAG) with desirable linguistic, computational and statistical properties. Unlike in traditional LTAG, subcategorization frames and the argument-adjunct distinction are left underspecified in LTAG-spinal. LTAG-spinal with adjunction constraints is weakly equivalent to LTAG. The LTAG-spinal for...

Tree Insertion Grammar: A Cubic-Time Parsable Formalism That Lexicalizes Context-Free Grammar Without Changing the Trees Produced

Tree insertion grammar (TIG) is a tree-based formalism that makes use of tree substitution and tree adjunction. TIG is related to tree adjoining grammar. However, the adjunction permitted in TIG is sufficiently restricted that TIGs only derive context-free languages and TIGs have the same cubic-time worst-case complexity bounds for recognition and parsing as context-free grammars. An efficient Earl...


Publication date: 2014